Existing object detection methods are restricted to a fixed-set vocabulary by costly labeled data. When dealing with novel categories, the model has to be retrained with more bounding box annotations. Natural language supervision is an attractive alternative for its annotation-free attributes and broader object concepts. However, learning open-vocabulary object detection from language is challenging since image-text pairs do not contain fine-grained object-language alignments. Previous solutions rely on either expensive grounding annotations or distilling classification-oriented vision models. In this paper, we propose a novel open-vocabulary object detection framework directly learning from image-text pair data. We formulate object-language alignment as a set matching problem between a set of image region features and a set of word embeddings. It enables us to train an open-vocabulary object detector on image-text pairs in a much simpler and more effective way. Extensive experiments on two benchmark datasets, COCO and LVIS, demonstrate our superior performance over the competing approaches on novel categories, e.g., achieving 32.0% mAP on COCO and 21.7% mask mAP on LVIS. Code is available at: https://github.com/clin1223/VLDet.
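As a minimal sketch of the region-word set matching idea described above (assuming cosine similarity, Hungarian one-to-one matching, and a cross-entropy alignment loss; the function names and loss form are illustrative, not the released VLDet code):

```python
import torch
import torch.nn.functional as F
from scipy.optimize import linear_sum_assignment

def region_word_alignment_loss(region_feats, word_embeds, temperature=0.07):
    """region_feats: (R, D) image region features; word_embeds: (W, D) word embeddings."""
    sim = F.normalize(region_feats, dim=-1) @ F.normalize(word_embeds, dim=-1).T  # (R, W)
    # Hungarian matching maximizes total similarity (so we minimize its negation).
    rows, cols = linear_sum_assignment(-sim.detach().cpu().numpy())
    rows = torch.as_tensor(rows, device=sim.device)
    cols = torch.as_tensor(cols, device=sim.device)
    logits = sim[rows] / temperature      # each matched region scored against all words
    return F.cross_entropy(logits, cols)  # pull matched region-word pairs together
```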
Existing gait recognition research is dominated by in-the-lab scenarios. Since people live in the real world, gait recognition in the wild is a more practical problem that has recently attracted the attention of the multimedia and computer vision communities. Current methods that achieve state-of-the-art performance on existing benchmarks attain much worse accuracy on the recently proposed in-the-wild datasets, because these methods can hardly model the varied temporal dynamics of gait sequences in unconstrained scenes. Therefore, this paper proposes a novel multi-hop temporal switch method to achieve effective temporal modeling of gait patterns in real-world scenes. Specifically, we design a novel gait recognition network, called the Multi-hop Temporal Switch Network (MTSGait), to learn spatial features and multi-scale temporal features simultaneously. Different from existing methods that use 3D convolutions for temporal modeling, our MTSGait models the temporal dynamics of gait sequences with 2D convolutions. In this way, it achieves high efficiency with fewer model parameters and reduces the difficulty of optimization compared with 3D-convolution-based models. Based on the specific design of the 2D convolution kernels, our method can eliminate the misalignment of features between adjacent frames. In addition, a new sampling strategy, i.e., non-cyclic continuous sampling, is proposed to make the model learn more robust temporal features. Finally, the proposed method achieves superior performance on two public gait-in-the-wild datasets, i.e., GREW and Gait3D, compared with state-of-the-art methods.
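The multi-hop temporal switch is not specified in detail here, so the following is a hedged sketch inspired by temporal-shift-style operations: slices of the channel dimension are exchanged with neighbouring frames at several hop distances, so the plain 2D convolutions that follow can mix multi-scale temporal context. Treat the function name, hop choices, and channel split as assumptions rather than the actual MTSGait design.

```python
import torch

def multi_hop_temporal_switch(x: torch.Tensor, hops=(1, 2)) -> torch.Tensor:
    """x: (N, T, C, H, W) per-frame gait feature maps."""
    n, t, c, h, w = x.shape
    out = x.clone()
    chunk = c // (2 * len(hops) + 1)           # channels moved per hop and direction
    for i, hop in enumerate(hops):
        fwd = slice(i * 2 * chunk, i * 2 * chunk + chunk)
        bwd = slice(i * 2 * chunk + chunk, (i + 1) * 2 * chunk)
        out[:, hop:, fwd] = x[:, :-hop, fwd]   # pull features from `hop` frames earlier
        out[:, :-hop, bwd] = x[:, hop:, bwd]   # pull features from `hop` frames later
    return out
```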
As a proactive network security protection scheme, the intrusion detection system (IDS) undertakes the important responsibility of detecting network attacks in the form of malicious network traffic, and intrusion detection technology is an important component of an IDS. At present, many scholars have carried out extensive research on intrusion detection technology. However, it remains difficult to develop efficient intrusion detection methods for large-scale network traffic data. Since generative adversarial networks (GANs) have powerful modeling capabilities for complex high-dimensional data, they provide new ideas for solving this problem. In this paper, we propose IDS-EBGAN, an intrusion detection method based on EBGAN, which classifies network records as normal traffic or malicious traffic. The generator in IDS-EBGAN is responsible for converting the original malicious network traffic in the training set into adversarial malicious examples, because we want to use adversarial learning to improve the discriminator's ability to detect malicious traffic. Meanwhile, the discriminator adopts an autoencoder model. During testing, IDS-EBGAN uses the reconstruction error of the discriminator to classify traffic records.
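The test-time rule described above (classifying a record by the autoencoder discriminator's reconstruction error) can be sketched as follows; the layer sizes, class names, and threshold handling are assumptions for illustration, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class AEDiscriminator(nn.Module):
    """Energy-based discriminator realised as a small autoencoder."""
    def __init__(self, dim: int, hidden: int = 32):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(dim, hidden), nn.ReLU())
        self.decoder = nn.Linear(hidden, dim)

    def forward(self, x):                      # per-sample reconstruction error
        recon = self.decoder(self.encoder(x))
        return ((recon - x) ** 2).mean(dim=1)

def classify_traffic(disc: AEDiscriminator, records: torch.Tensor, threshold: float):
    """Return 1 for malicious and 0 for normal, thresholding the reconstruction error."""
    with torch.no_grad():
        return (disc(records) > threshold).long()
```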
Although RGB-infrared cross-modality person re-identification (RGB-IR ReID) has enabled great progress in 24-hour intelligent surveillance, the state of the art still relies heavily on fine-tuning ImageNet pre-trained networks. Due to its single-modality nature, such large-scale pre-training may yield RGB-biased representations that hinder cross-modality image retrieval performance. This paper presents a self-supervised pre-training alternative, named Modality-Aware Multiple Granularity Learning (MMGL), which trains models directly from scratch and still achieves competitive results without external data or sophisticated tuning tricks. Specifically, MMGL maps RGB-IR images into a shared latent permutation space, and further improves local discriminability by maximizing the agreement between cycle-consistent RGB-IR image patches. Experiments show that MMGL learns better representations (+6.47% Rank-1) with faster training speed (convergence within a few hours) and greater data efficiency (<5% of the data size) than ImageNet pre-training. The results also show that it generalizes to various existing models and losses, and exhibits promising transferability across datasets. The code will be released.
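Assuming that "maximizing agreement" between cycle-consistent RGB-IR patches is realized as an InfoNCE-style contrastive objective (the actual MMGL loss and permutation mechanism may differ), a minimal sketch looks like this:

```python
import torch
import torch.nn.functional as F

def patch_agreement_loss(rgb_patches: torch.Tensor, ir_patches: torch.Tensor, tau: float = 0.1):
    """rgb_patches, ir_patches: (P, D) embeddings of corresponding image patches."""
    rgb = F.normalize(rgb_patches, dim=-1)
    ir = F.normalize(ir_patches, dim=-1)
    logits = rgb @ ir.T / tau                          # (P, P) cross-modality similarities
    targets = torch.arange(rgb.size(0), device=rgb.device)
    # Symmetric loss: each RGB patch should retrieve its IR counterpart and vice versa.
    return 0.5 * (F.cross_entropy(logits, targets) + F.cross_entropy(logits.T, targets))
```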
Complex networks contain complete subgraphs such as nodes, edges, triangles, and so on, referred to as simplices and cliques of different orders. Notably, cavities composed of higher-order cliques play an important role in brain function. Since searching for maximum cliques is an NP-complete problem, we use k-core decomposition to determine the computability of a given network. For computable networks, we design a search method with an implementable algorithm for finding cliques of different orders, and also obtain the Euler characteristic number. We then compute the Betti numbers by using the ranks of the boundary matrices of adjacent cliques. Furthermore, we design an optimized algorithm for finding cavities of different orders. Finally, we apply the algorithm to the neuronal network of C. elegans, with data from a typical dataset, and find all of its cliques of different orders and some of its cavities, providing a basis for further mathematical analysis and computation of its structure and function.
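A small illustration of the pipeline described above, using networkx as an assumed tool: restrict the search to the k-core, enumerate cliques of each order, and take the alternating sum of clique counts as the Euler characteristic. The Betti-number and cavity steps are omitted here.

```python
from collections import Counter
import networkx as nx

def clique_census(graph: nx.Graph, k: int):
    core = nx.k_core(graph, k)                        # keep the search tractable
    counts = Counter()
    for clique in nx.enumerate_all_cliques(core):     # cliques listed by increasing size
        counts[len(clique)] += 1
    # Euler characteristic: alternating sum over simplices of each order (V - E + T - ...).
    euler = sum(((-1) ** (size - 1)) * n for size, n in counts.items())
    return counts, euler

if __name__ == "__main__":
    g = nx.karate_club_graph()
    counts, euler = clique_census(g, k=2)
    print(dict(counts), euler)
```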
Recent progress in intelligent fault diagnosis (IFD) has relied heavily on deep representation learning and plenty of labeled data. However, machines often operate under various working conditions, or the target task has a distribution different from that of the data collected for training (the domain shift problem). Besides, the newly collected test data in the target domain are usually unlabeled, leading to unsupervised deep transfer learning based (UDTL-based) IFD problems. Although UDTL-based IFD has achieved huge development, a standard and open-source code framework as well as a comparative study has not yet been established. In this paper, we construct a new taxonomy and perform a comprehensive review of UDTL-based IFD according to different tasks. Comparative analysis of some typical methods and datasets reveals some open and essential issues in UDTL-based IFD that are rarely studied, including the transferability of features, the influence of backbones, negative transfer, physical priors, and so on. To emphasize the importance and reproducibility of UDTL-based IFD, the whole test framework will be released to the research community to facilitate future research. In summary, the released framework and comparative study can serve as an extended interface and basic results for new studies on UDTL-based IFD. The code framework is available at \url{https://github.com/ZhaoZhibin/UDTL}.
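As a hedged sketch of one typical UDTL-based IFD ingredient covered by such reviews, the snippet below aligns labeled source-domain and unlabeled target-domain features with a simple linear-kernel MMD penalty on a shared backbone; the architecture, names, and loss weighting are illustrative assumptions, not part of the released framework.

```python
import torch
import torch.nn as nn

def mmd_loss(source: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
    """Linear-kernel maximum mean discrepancy between two feature batches of shape (B, D)."""
    return ((source.mean(dim=0) - target.mean(dim=0)) ** 2).sum()

class UDTLClassifier(nn.Module):
    def __init__(self, in_dim: int, n_classes: int):
        super().__init__()
        self.backbone = nn.Sequential(nn.Linear(in_dim, 64), nn.ReLU())
        self.head = nn.Linear(64, n_classes)

    def forward(self, x_src, y_src, x_tgt, lam: float = 1.0):
        f_src, f_tgt = self.backbone(x_src), self.backbone(x_tgt)
        cls = nn.functional.cross_entropy(self.head(f_src), y_src)  # supervised source loss
        return cls + lam * mmd_loss(f_src, f_tgt)                   # plus domain alignment
```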
Optimization in multi-task learning (MTL) is more challenging than single-task learning (STL), as the gradient from different tasks can be contradictory. When tasks are related, it can be beneficial to share some parameters among them (cooperation). However, some tasks require additional parameters with expertise in a specific type of data or discrimination (specialization). To address the MTL challenge, we propose Mod-Squad, a new model that is Modularized into groups of experts (a 'Squad'). This structure allows us to formalize cooperation and specialization as the process of matching experts and tasks. We optimize this matching process during the training of a single model. Specifically, we incorporate mixture of experts (MoE) layers into a transformer model, with a new loss that incorporates the mutual dependence between tasks and experts. As a result, only a small set of experts are activated for each task. This prevents the sharing of the entire backbone model between all tasks, which strengthens the model, especially when the training set size and the number of tasks scale up. More interestingly, for each task, we can extract the small set of experts as a standalone model that maintains the same performance as the large model. Extensive experiments on the Taskonomy dataset with 13 vision tasks and the PASCAL-Context dataset with 5 vision tasks show the superiority of our approach.
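One way to read the "mutual dependence between tasks and experts" is as a mutual-information term over the routing distribution; the sketch below is that interpretation only, not the released Mod-Squad loss.

```python
import torch

def task_expert_mi(routing: torch.Tensor) -> torch.Tensor:
    """routing: (n_tasks, n_experts) mean routing probabilities per task (rows sum to 1)."""
    n_tasks = routing.size(0)
    joint = routing / n_tasks                        # p(task, expert) with uniform p(task)
    p_task = joint.sum(dim=1, keepdim=True)          # marginal over experts
    p_expert = joint.sum(dim=0, keepdim=True)        # marginal over tasks
    mi = (joint * (joint.clamp_min(1e-9) / (p_task * p_expert)).log()).sum()
    return -mi                                       # minimizing this maximizes the MI
```

Maximizing this term encourages each task to concentrate on a small, dedicated set of experts while keeping every expert used by some task, which matches the sparse-activation behaviour described above.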
We present a new method for generating controllable, dynamically responsive, and photorealistic human animations. Given an image of a person, our system allows the user to generate Physically plausible Upper Body Animation (PUBA) using interaction in the image space, such as dragging their hand to various locations. We formulate a reinforcement learning problem to train a dynamic model that predicts the person's next 2D state (i.e., keypoints on the image) conditioned on a 3D action (i.e., joint torque), and a policy that outputs optimal actions to control the person to achieve desired goals. The dynamic model leverages the expressiveness of 3D simulation and the visual realism of 2D videos. PUBA generates 2D keypoint sequences that achieve task goals while being responsive to forceful perturbation. The sequences of keypoints are then translated by a pose-to-image generator to produce the final photorealistic video.
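A minimal sketch of the two learned components described above, assuming plain MLPs and illustrative tensor shapes: a dynamics model maps the current 2D keypoints and a 3D torque action to the next keypoints, and a policy maps keypoints and a goal to torques.

```python
import torch
import torch.nn as nn

class KeypointDynamics(nn.Module):
    def __init__(self, n_kpts: int, action_dim: int, hidden: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_kpts * 2 + action_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, n_kpts * 2),
        )

    def forward(self, kpts_2d, torque):              # kpts_2d: (B, K, 2), torque: (B, A)
        delta = self.net(torch.cat([kpts_2d.flatten(1), torque], dim=1))
        return kpts_2d + delta.view_as(kpts_2d)      # predicted next-frame keypoints

class TorquePolicy(nn.Module):
    def __init__(self, n_kpts: int, action_dim: int, hidden: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_kpts * 2 * 2, hidden), nn.ReLU(),
            nn.Linear(hidden, action_dim),
        )

    def forward(self, kpts_2d, goal_2d):             # goal_2d: desired keypoint positions
        return self.net(torch.cat([kpts_2d.flatten(1), goal_2d.flatten(1)], dim=1))
```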
This paper describes the ESPnet Unsupervised ASR Open-source Toolkit (EURO), an end-to-end open-source toolkit for unsupervised automatic speech recognition (UASR). EURO adopts the state-of-the-art UASR learning method introduced by the Wav2vec-U, originally implemented at FAIRSEQ, which leverages self-supervised speech representations and adversarial training. In addition to wav2vec2, EURO extends the functionality and promotes reproducibility for UASR tasks by integrating S3PRL and k2, resulting in flexible frontends from 27 self-supervised models and various graph-based decoding strategies. EURO is implemented in ESPnet and follows its unified pipeline to provide UASR recipes with a complete setup. This improves the pipeline's efficiency and allows EURO to be easily applied to existing datasets in ESPnet. Extensive experiments on three mainstream self-supervised models demonstrate the toolkit's effectiveness and achieve state-of-the-art UASR performance on TIMIT and LibriSpeech datasets. EURO will be publicly available at https://github.com/espnet/espnet, aiming to promote this exciting and emerging research area based on UASR through open-source activity.
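As a heavily hedged sketch of the wav2vec-U-style adversarial objective that such toolkits build on (generic PyTorch, not EURO's, ESPnet's, or FAIRSEQ's actual API): a generator maps self-supervised speech features to phoneme distributions, and a discriminator tries to tell them apart from distributions derived from unpaired text.

```python
import torch
import torch.nn as nn

class SegmentGenerator(nn.Module):
    def __init__(self, feat_dim: int, n_phones: int):
        super().__init__()
        self.proj = nn.Linear(feat_dim, n_phones)

    def forward(self, ssl_feats):                    # (B, T, feat_dim) -> (B, T, n_phones)
        return self.proj(ssl_feats).softmax(dim=-1)

class PhoneDiscriminator(nn.Module):
    def __init__(self, n_phones: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(n_phones, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1), nn.Flatten(), nn.Linear(64, 1),
        )

    def forward(self, phone_dist):                   # (B, T, n_phones)
        return self.net(phone_dist.transpose(1, 2))  # one real/fake logit per sequence
```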
Contour-based instance segmentation methods include one-stage and multi-stage schemes. These approaches achieve remarkable performance, but they have to define plenty of points to segment precise masks, which leads to high complexity. We address this issue and present a single-shot method, called \textbf{VeinMask}, that achieves competitive performance at low design complexity. Concretely, we observe that a leaf locates coarse margins via major veins and grows minor veins to refine twisty parts, which makes it possible to cover any object accurately. Meanwhile, major and minor veins share the same growth mode, which avoids modeling them separately and ensures model simplicity. Considering the superiorities above, we propose VeinMask to formulate the instance segmentation problem as the simulation of the vein growth process and to predict the major and minor veins in polar coordinates. Besides, centroidness is introduced for instance segmentation tasks to help suppress low-quality instances. Furthermore, a surroundings cross-correlation sensitive (SCCS) module is designed to enhance the feature expression by utilizing the surroundings of each pixel. Additionally, a Residual IoU (R-IoU) loss is formulated to supervise the regression tasks of major and minor veins effectively. Experiments demonstrate that VeinMask performs much better than other contour-based methods at low design complexity. Particularly, our method outperforms existing one-stage contour-based methods on the COCO dataset with almost half the design complexity.
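Assuming the usual polar-contour parameterization (rays of predicted length at evenly spaced angles around an instance centre), the decoding step can be sketched as below; the exact VeinMask decoding of major and minor veins may differ.

```python
import math
import torch

def polar_to_contour(center: torch.Tensor, ray_lengths: torch.Tensor) -> torch.Tensor:
    """center: (2,) = (x, y); ray_lengths: (R,) distances along R evenly spaced angles."""
    r = ray_lengths.size(0)
    angles = torch.arange(r, dtype=ray_lengths.dtype) * (2 * math.pi / r)
    xs = center[0] + ray_lengths * torch.cos(angles)
    ys = center[1] + ray_lengths * torch.sin(angles)
    return torch.stack([xs, ys], dim=1)              # (R, 2) contour vertices
```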